Abstract

Given a graph of interactions, a module (also called a community or cluster)
is a subset of nodes whose fitness is a function of the statistical significance
of the pairwise interactions of nodes in the module. The topic of this paper
is a model-based community finding approach, commonly referred to as
modularity clustering, that was originally proposed by Newman [25] and has
subsequently been extremely popular in practice {e.g., see [ll, HjET 



Various heuristic methods are currently employed for finding the optimal
solution. However, as observed in [H, the exact computational complexity of
this approach is still largely unknown.

To this end, we initiate a systematic study of the computational complexity
of modularity clustering. Due to the specific quadratic nature of the
modularity function, it is necessary to study its value on sparse graphs and
dense graphs separately. Our main results include a $(1+\varepsilon)$-inapproximability
for dense graphs and a logarithmic approximation for sparse graphs. We
make use of several combinatorial properties of modularity to get these
results. These are the first non-trivial approximability results beyond the
NP-hardness results in [10].
1. Introduction

Many systems of interaction in biology and social science are modeled as 
a graph of pairwise interaction of entities [2I, [sj . An important problem for 
these types of graphs is to partition the nodes into so-called "communities" or 
"modules" of "statistically significant" interactions. Such partitions facilitate 
studying interesting properties of these graphs in their applications, such as
studying the behavioral patterns of an individual in a societal context, and
serve as important components in computational analysis of these graphs. In
this paper we consider the static model of interaction in which the network 
interconnections do not change over time. 

Simplistic definitions of modules, such as cliques, unfortunately do not
apply well in the context of biological and social networks and therefore
alternative definitions are most often used. In the "model-based" community
finding approach, one first starts with an appropriate "global null model" Q
of a background random graph^ and then attempts to place nodes in the
same module if their interaction patterns are significantly stronger than that
inferred from the null model. The null model Q may provide, implicitly or
explicitly, the probability $p_{i,j}$ of an edge between two nodes $v_i$ and $v_j$. As
an illustration, suppose that our input is an edge-weighted graph with all
weights being positive and normalized between 0 and 1. Then, if $p_{i,j}$ differs
significantly from $w_{i,j}$, the weight of the edge between nodes $v_i$ and $v_j$, the
edge may be considered to be statistically significant; thus, if $p_{i,j} \ll w_{i,j}$ then
it is preferable that $v_i$ and $v_j$ should be placed in the same module whereas
if $p_{i,j} \gg w_{i,j}$ then it is preferable that $v_i$ and $v_j$ should be placed in different
modules. The standard $\{+,-\}$-correlation clustering that appears in the
computer science literature extensively [sl, E2, 33 1 can be placed in the above 



model-based clustering framework in the following manner: given the input
graph G with each edge labeled as $+$ or $-$, let H be the graph consisting of



^Of course, any clustering measure that relies on a global null model suffers from the 
drawback that each node can get attached to any other node of the graph; for another 
possible drawback see The purpose of this paper is not to debate on the pros and 

cons of model-based clustering. 
all edges labeled $+$ in G, $p_{i,j} = 0$ (resp., $p_{i,j} = 1$) if the edge was labeled $+$ or
missing (resp., labeled $-$), the modularity of an edge is $a_{i,j} - p_{i,j}$ where $a_{i,j}$
is the entry in the adjacency matrix of H and the total modularity is
a function of individual modularities of edges as induced by the clustering.

In this paper, we investigate a model-based clustering approach originally
introduced by Newman and subsequently studied by Newman and others in
several papers [25], [28], [30]. The null model in this approach is dependent on
the degree distribution of the given graph. Throughout the paper, by a set of
communities (or clusters) we mean a partition S of the nodes of the graph
and, except in Section 5.1, all graphs are undirected.

1.1. The Basic Setup For Undirected Unweighted Graphs 

The basic setup for undirected unweighted graphs as described below
can easily be generalized to the case of edge-weighted undirected graphs
(see Section 4.3) and edge-weighted directed graphs (see Section 5.1). Let
$G = (V,E)$ denote the given input graph with $n = |V|$ nodes and $m = |E|$
edges, let $d_v$ denote the degree of node $v \in V$, and let $A = [a_{u,v}]$ denote the
adjacency matrix of G, i.e., $a_{u,v} = 1$ if $\{u,v\} \in E$ and $a_{u,v} = 0$ otherwise.
The null model Q for modularity clustering is defined by the edge probability
function $p_{u,v} = \frac{d_u d_v}{2m}$ for $u,v \in V$ with $u = v$ being allowed; note that the null
model provides a random network such that the expected degree of a node
v is precisely $d_v$. Intuitively, if $a_{u,v}$ differs significantly from $p_{u,v}$ then the
connection (or, the lack of it) is a significant deviation from the null model.
Based on this intuition, the fitness of the community formed by a subset of
nodes $C \subseteq V$ is defined as^

$$M(C) = \frac{1}{2m} \sum_{u,v \in C} \left( a_{u,v} - \frac{d_u d_v}{2m} \right) \qquad (1)$$

Then, a partition $S = \{C_1, C_2, \ldots, C_k\}$ of V has a total modularity of

$$M(S) = \sum_{C_i \in S} M(C_i) \qquad (2)$$

Notice that each distinct pair of nodes u and v contributes twice to the inside
term $a_{u,v} - \frac{d_u d_v}{2m}$ in Equation (1). The goal is to find a partition (modular



^The $1/(2m)$ factor is for normalization purposes only, to make the optimal objective
value lie between 0 and 1.
clustering) S (with unspecified k) to maximize $M(S)$. Note that by allowing
u and v to be equal in the inside summation, we provide a negative weight
to every node.

Let $OPT = \max_S M(S)$ denote the optimal modularity value. It is easy to
verify that $0 \le OPT < 1$.
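The definition above can be evaluated directly on small inputs. The following is a minimal sketch, assuming the standard reading of Equations (1) and (2) with $p_{u,v} = d_u d_v/(2m)$; the function name `modularity` and the edge-list input format are illustrative choices, not part of the paper.

```python
from fractions import Fraction
from itertools import product


def modularity(n, edges, clusters):
    """Total modularity M(S), summed as in Equations (1)-(2).

    n        -- number of nodes, labelled 0..n-1
    edges    -- undirected edges (u, v) with u != v, no repeats
    clusters -- list of disjoint node lists covering 0..n-1
    """
    edge_set = {frozenset(e) for e in edges}
    m = len(edge_set)
    deg = [0] * n
    for e in edge_set:
        for x in e:
            deg[x] += 1
    total = Fraction(0)
    for cluster in clusters:
        # Ordered pairs (u, v) with u == v allowed, so each distinct
        # pair is counted twice and each node once (negative weight).
        for u, v in product(cluster, repeat=2):
            a_uv = 1 if frozenset((u, v)) in edge_set else 0
            total += a_uv - Fraction(deg[u] * deg[v], 2 * m)
    return total / (2 * m)


# Two disjoint triangles: clustering by triangle vs. one big cluster.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
by_triangle = modularity(6, two_triangles, [[0, 1, 2], [3, 4, 5]])
single = modularity(6, two_triangles, [[0, 1, 2, 3, 4, 5]])
```

Exact rationals are used so that values such as 0 can be compared without rounding concerns.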

1.2. Brief History of Modularity Clustering and Its Applications 

The modularity clustering approach is extremely popular both in the
context of biological networks [20, 32] as well as social networks [1, 25, 28, 30].
However, as observed previously, not much was known about the computational
complexity aspects of modularity clustering beyond NP-completeness for dense
graphs, though various heuristic methods have been proposed and empirically
evaluated in publications such as [1], [15], [31] via methods such as finding
minimum weighted cuts. For unweighted networks, it is known that OPT = 0
if G is a clique, $OPT = 1 - \frac{1}{k}$ if G is a union of k disjoint cliques each with
n/k nodes, computing OPT is NP-complete for sufficiently dense graphs^,
and the above-mentioned NP-completeness result holds even if any solution
is constrained to contain no more than two clusters [10].
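The two closed-form values quoted above can be checked by exhaustive search on very small graphs. This is only a sanity-check sketch: it enumerates all set partitions, so it is exponential and usable only for a handful of nodes. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def modularity(n, edges, labels):
    """M(S) for the partition given by labels[v] = cluster id of node v."""
    m = len(edges)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    adj = {frozenset(e) for e in edges}
    num = 0  # integer accumulator for sum of (2m * a_uv - d_u * d_v)
    for u in range(n):
        for v in range(n):
            if labels[u] == labels[v]:
                a_uv = 1 if frozenset((u, v)) in adj else 0
                num += 2 * m * a_uv - deg[u] * deg[v]
    return Fraction(num, (2 * m) ** 2)


def all_partitions(n):
    """Yield every set partition of 0..n-1 as a restricted-growth label list."""
    def extend(labels, used):
        if len(labels) == n:
            yield list(labels)
            return
        for c in range(used + 1):
            labels.append(c)
            yield from extend(labels, max(used, c + 1))
            labels.pop()
    yield from extend([], 0)


def brute_force_opt(n, edges):
    """Exact OPT by trying every partition (tiny graphs only)."""
    return max(modularity(n, edges, labels) for labels in all_partitions(n))


def clique(nodes):
    return list(combinations(nodes, 2))
```

For example, `brute_force_opt(5, clique(range(5)))` evaluates the clique case, and two disjoint triangles give the k = 2 case of the disjoint-cliques formula.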



1.3. Informal Summary of Our Results

Unless mentioned otherwise explicitly, all algorithmic results apply for 
edge-weighted graphs and all hardness results apply for unweighted graphs. 

Hardness Results For dense graphs, namely for the complements of 3-regular
graphs, Theorem 3.1 in Section 3.1 provides a $(1+\varepsilon)$-inapproximability
of the modularity clustering problem irrespective of whether the number of
clusters is pre-specified or the algorithm is allowed to select the best number
of clusters^. The required approximation gap in our reduction is derived
from the approximation gap of the maximum independent set problem for
3-regular graphs in [ij]. The intuition behind our inapproximability result is
that, for the type of dense graphs that is considered in our reduction, large-size
cliques must be properly contained within the clusters. However, the
gap preservation calculations need to be done extremely accurately to avoid


^The reduction roughly requires $d_v = \Omega(\sqrt{n})$ for every node v.
^The proof shows that $\varepsilon$ is roughly 0.0006.
shrinking the inapproximability gap^.

Lemma 2.1 in Section 2 shows, using probabilistic arguments, that a small
number of clusters already approximates the optimal modularity value well; in
particular, partitioning into just two clusters already achieves at least half of the
optimum. Thus, it behooves us to look at the complexity of the problem when
we have at most two clusters, which we refer to as the 2-clustering problem.
Theorem 4.1 in Section 4.1 proves the NP-completeness of the 2-clustering
problem for sparse graphs, namely for d-regular graphs with any fixed $d \ge 9$;
the previous NP-completeness result for this case in [10] required the degree
of every node to be large (roughly $\Omega(\sqrt{n})$). Notice that we cannot anymore
use the idea of hiding a large-size clique since the graph does not have any
cliques of size more than d + 1 and, for fixed d, one can indeed enumerate all
these cliques in polynomial time. Instead, our reduction is from the graph
bisection problem for 4-regular graphs. Intuitively, now an optimal solution
for 2-clustering is constrained to have exactly the same number of nodes in
each community to avoid any local improvement. The ideas in the reduction
are motivated by the proof for this case in [10], but we have to do a
more careful reduction and analysis to preserve both the low-degree and the
regularity of the resulting graph.

Approximation Algorithms We first consider the case of sparse graphs.
We show in Section 4.2 that a natural linear programming relaxation of
modularity clustering has a large integrality gap, thereby ruling out this
avenue for non-trivial approximation^. Theorem 4.5 in Section 4.3 provides
an $O(\log d)$-approximation for most (unweighted) d-regular graphs (i.e.,
with d < jrtn)^ approximation that is logarithmic in the maximum 

weighted degree for weighted graphs provided maximum weighted degre^ is 
no more than about ^fn . It is easy to see that the modularity function is 
neither monotone nor sub-modular, thus we instead need to use semi-definite 
programming (SDP) techniques for maximizing quadratic forms. However, we
face several technical hurdles in using SDP-based approximation algorithms 



^For example, the inapproximability gap of Berman and Karpinski in [9] does not suffice
for our purposes.

^Interestingly, the proof shows that d-regular expander graphs have small modularity
values ($\approx 1/\sqrt{d}$).

^As noted in Section 4.3, we normalize all the weights such that their sum is exactly
twice the number of edges.
for quadratic forms in [5|, Ig, ll3| : the coefficient matrix has negative diagonal
entries and the lower bounds (hence the approximation ratios) in js], [g, 
depend on the number of nodes and not on the degree. Thus, our proof pro- 
ceeds in two steps. In the first step we obtain a lower bound on the optimal 
modularity value as a function of the degree or the maximum weighted degree 
using an explicit graph decomposition. In the second step, we show that the 
SDP-based method for quadratic forms can be used to obtain an approxima- 
tion that is within a logarithmic factor of this lower bound in spite of the 
negative diagonal entries. 

For locally-dense weighted graphs (i.e., graphs in which every node has a
weighted degree of $\Omega(n)$) we observe in Section 3.2 that one can get a solution
within any constant additive error in polynomial time by a simple use of 
the regularity lemma. In view of our APX-hardness result for dense graphs 
described before, this is perhaps the best polynomial-time approximation one 
could hope for. 

Directed weighted Graphs In Section 5.1 we show that all the hardness
and approximation results for undirected weighted graphs can be extended 
to similar results for directed weighted graphs. 

Alternative Objectives and Null Models There are two natural objections
to Newman's modularity clustering: approximate solutions provably
tend to produce many trivial (single-node) clusters and the background null
model could be different^. Motivated by these observations, we consider two
variations of the original modularity measure, one in which the modularity of
the network is the minimum (instead of sum) of the modularities of individual
clusters and the other in which the null model is the classical Erdős-Rényi
random graph. Our results show that the minimum objective provides similar
optimal modularity values as the original sum objective without allowing
small clusters, and the Erdős-Rényi random graph null model is equivalent to
Newman's modularity clustering in an appropriately defined regular graph.



^The idea of using alternative null models has been explored before by some
researchers [3, [2^; in particular, Karrer and Newman [23] showed that the scale-free null
model provided by linear preferential attachment does not provide a new null model.
However, the focus in all these results was mainly to empirically compare null models using
simple algorithms based on greedy approaches without provable approximation guarantees.
1.4. Comments on Our Results



Relationships to previous approximation algorithms for quadratic 
forms The special case of partitioning the nodes into two clusters only can be 
written down as maximizing a quadratic form. However, none of the existing 
approximability results for quadratic forms apply directly to our case. In 
particular, the $O(\log n)$-approximation in j^, 13| is not applicable since the
diagonal entries of the resulting constraint matrix are negative^, results such
as m 21| do not apply since the constraint matrix is not necessarily a positive
semi-definite matrix and the $O(1)$-approximations of [gl via Grothendieck's
inequality do not apply since the quadratic form does not induce a bipartition
of variables.

Possibility of logarithmic approximation without degree constraints 

Our logarithmic approximations require some bound on the maximum degree 
of the given graph. A natural question is of course if such degree bounds can 
be removed. Two observations regarding this are relevant: 

• A technical difficulty that arises for this purpose is from the fact that
the modularity value can be precisely 0 (such as when the given graph is
Kn^n OT a graph obtained from by removing polylog(n) edges) or 
arbitrarily close to 0 (such as when the given graph is the complement of a small
degree graph). Thus, at the very least, a non-trivial approximation without 
such degree bounds would require an efficient polynomial-time computable 
characterization of the topology of graphs whose modularity values can be 
arbitrarily small together with a special algorithmic approach to handle these 
graphs; approaches using quadratic forms or the regularity lemma do not 
suffice in this respect. 

• The negative weights of the nodes start playing a more crucial role in the
value of modularity when it is close to 0. As observed by other researchers
before, negative diagonal entries in the coefficient matrix of the objective that
shift the objective value close to 0 are sometimes difficult to approximate.

Relationships to other clustering or partitioning methods Modular- 
ity clustering can be defined by several equivalent equations, which may seem 



^The negative diagonal entries are crucial in the modularity measure [1], [26]. Moreover,
they could be small or large depending on the graph, thus it is not possible to specify an
a priori bound on them.
to suggest at a first glance that combinatorially the problem may be either
similar to (via Equations (1) and (2)) some form of correlation clustering, or
(via Equation (5)) similar to graph bisection (for two clusters), or similar to
minimum $\ell$-way cut/clique-partition type of problem (for arbitrary number
of clusters, depending on whether the graph is unweighted or weighted), or
similar to (via Lemma 2.2) some type of dense subgraph problem. However,
our results show both similarities and differences between modularity clustering
and these problems. For example, our hardness result for dense graphs
should be contrasted with other partitioning problems of similar nature, such
as MAX-CUT, graph bisection, graph separation, minimum $\ell$-way cut and
some versions of correlation clustering, for which one can design a PTAS
(e.g., see BBQ).



2. Basic Results on Partitioning into Fewer Clusters 

In this section we show bounds on OPT as well as some useful properties 
of the solution if we restrict the number of clusters to some pre-specified 
value k; we will refer to this as the k-clustering problem. The objective 
function M{S) can be equivalently represented (via algebraic manipulation 



as observed in [10], [25], [28], [30]) as follows. Let $m_i$ denote the number of
edges whose both endpoints are in the cluster $C_i$, $m_{ij}$ denote the number of
edges one of whose endpoints is in $C_i$ and the other in $C_j$ and $D_i = \sum_{v \in C_i} d_v$
denote the sum of degrees of nodes in cluster $C_i$. Then,

$$M(S) = \sum_{i} \left( \frac{m_i}{m} - \left( \frac{D_i}{2m} \right)^2 \right) \qquad (3)$$

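Since $\sum_{v \in V} \left( a_{u,v} - \frac{d_u d_v}{2m} \right) = 0$ for any $u \in V$, we can alternatively express
M(C) as

$$M(C) = -\frac{1}{2m} \sum_{u \in C,\; v \notin C} \left( a_{u,v} - \frac{d_u d_v}{2m} \right) \qquad (4)$$

This, along with Equation (3), gives us the following third equation of
modularity (note that now each pair of clusters contributes to the sum in
Equation (5) exactly once):

$$M(S) = \sum_{i<j} \left( \frac{D_i D_j}{2m^2} - \frac{m_{ij}}{m} \right) \qquad (5)$$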
Let $OPT_k$ denote the modularity value of an optimal clustering when one is
allowed at most k clusters.
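The equivalence of the alternative formulations can be checked numerically. The sketch below assumes the three forms are the pairwise sum inside clusters, the per-cluster form $m_i/m - (D_i/2m)^2$, and the cluster-pair form $D_i D_j/(2m^2) - m_{ij}/m$; the function names and the 7-node example graph are illustrative only.

```python
from fractions import Fraction
from itertools import combinations


def _degrees(n, edges):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def modularity_pairs(n, edges, clusters):
    """Sum over ordered node pairs inside each cluster."""
    m, deg = len(edges), _degrees(n, edges)
    adj = {frozenset(e) for e in edges}
    num = 0  # integer accumulator for sum of (2m * a_uv - d_u * d_v)
    for c in clusters:
        for u in c:
            for v in c:
                a_uv = 1 if frozenset((u, v)) in adj else 0
                num += 2 * m * a_uv - deg[u] * deg[v]
    return Fraction(num, (2 * m) ** 2)


def modularity_clusters(n, edges, clusters):
    """Per-cluster form: sum_i m_i/m - (D_i/2m)^2."""
    m, deg = len(edges), _degrees(n, edges)
    total = Fraction(0)
    for c in clusters:
        members = set(c)
        m_i = sum(1 for u, v in edges if u in members and v in members)
        d_i = sum(deg[v] for v in members)
        total += Fraction(m_i, m) - Fraction(d_i, 2 * m) ** 2
    return total


def modularity_cuts(n, edges, clusters):
    """Cluster-pair form: sum_{i<j} D_i D_j/(2 m^2) - m_ij/m."""
    m, deg = len(edges), _degrees(n, edges)
    total = Fraction(0)
    for ci, cj in combinations([set(c) for c in clusters], 2):
        m_ij = sum(1 for u, v in edges
                   if (u in ci and v in cj) or (u in cj and v in ci))
        d_i = sum(deg[v] for v in ci)
        d_j = sum(deg[v] for v in cj)
        total += Fraction(d_i * d_j, 2 * m * m) - Fraction(m_ij, m)
    return total


# Two triangles joined by an edge, plus a pendant node.
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
CLUSTERS = [[0, 1, 2], [3, 4, 5], [6]]
values = [f(7, EDGES, CLUSTERS)
          for f in (modularity_pairs, modularity_clusters, modularity_cuts)]
```

All three entries of `values` should coincide for any graph and partition, which makes this a convenient regression check when implementing one of the forms.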

The following two lemmas make use of the alternative formulations described
above. The first lemma asserts, via a probabilistic argument, that
the optimal value does not go down by too much in our restricted setting.

Lemma 2.1. For any $k \ge 1$, $\left(1 - \frac{1}{k}\right) OPT \le OPT_k \le 1 - \frac{1}{k}$.

Proof. The inequality $OPT_k \le 1 - \frac{1}{k}$ can be proved as follows. For any
clustering S with at most k clusters, Equation (3) gives
$M(S) = \sum_{i=1}^{k} \frac{m_i}{m} - \sum_{i=1}^{k} \left( \frac{D_i}{2m} \right)^2$. The first sum in this equation
is upper-bounded by 1. Using the Cauchy-Schwarz inequality, we get
$k \sum_{i=1}^{k} D_i^2 \ge \left( \sum_{i=1}^{k} D_i \right)^2 = (2m)^2$, giving a lower
bound of $\frac{1}{k}$ for the second sum.

The inequality $\left(1 - \frac{1}{k}\right) OPT \le OPT_k$ can be proved as follows. For k = 1,
the statement is trivially true. Now consider k > 1. We will make use of
Equation (5) for modularity values. Suppose that our optimal clustering S
has more than k clusters. Denote each term in the summation of Equation (5)
by $M_{ij}$, i.e., $M_{ij} = \frac{D_i D_j}{2m^2} - \frac{m_{ij}}{m}$; thus $OPT = M(S) = \sum_{i<j} M_{ij}$. We
assign each of the clusters, independently and uniformly at random, to one of
k superclusters. Let $I_{ij}$ be the indicator random variable of the event that $C_i$
and $C_j$ are in different superclusters and let $S_k$ denote the random k-clustering.
It is easy to see that any pair $C_i$ and $C_j$ will contribute $M_{ij}$ to the final
clustering if and only if they are not in the same supercluster. Therefore,
$M(S_k) = \sum_{i<j} I_{ij} M_{ij}$. Thus we get

$$OPT_k \ge E[M(S_k)] = \sum_{i<j} E[I_{ij}]\, M_{ij} = \sum_{i<j} \left(1 - \frac{1}{k}\right) M_{ij} = \left(1 - \frac{1}{k}\right) OPT. \qquad □$$
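The averaging step in this proof can be reproduced exactly on a toy instance by enumerating every assignment of clusters to superclusters instead of sampling. This is a sketch under the assumption that modularity is evaluated in its per-cluster form $m_i/m - (D_i/2m)^2$; the identity holds for any starting clustering, not only an optimal one. Function names and the example graph are hypothetical.

```python
from fractions import Fraction
from itertools import product


def modularity(n, edges, labels):
    """M(S) in the per-cluster form sum_c m_c/m - (D_c/2m)^2."""
    m = len(edges)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    inside, dsum = {}, {}
    for v in range(n):
        dsum[labels[v]] = dsum.get(labels[v], 0) + deg[v]
    for u, v in edges:
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0) + 1
    return sum((Fraction(inside.get(c, 0), m) - Fraction(dsum[c], 2 * m) ** 2
                for c in dsum), Fraction(0))


def average_after_random_merge(n, edges, labels, k):
    """Exact mean and max of M over all k^t supercluster assignments."""
    clusters = sorted(set(labels))
    values = []
    for assignment in product(range(k), repeat=len(clusters)):
        super_of = dict(zip(clusters, assignment))
        merged = [super_of[c] for c in labels]
        values.append(modularity(n, edges, merged))
    return sum(values, Fraction(0)) / len(values), max(values)


EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
LABELS = [0, 0, 0, 1, 1, 1, 2]
base = modularity(7, EDGES, LABELS)
avg2, best2 = average_after_random_merge(7, EDGES, LABELS, 2)
avg3, best3 = average_after_random_merge(7, EDGES, LABELS, 3)
```

Since the best assignment is at least as good as the average, `best2` and `best3` give concrete witnesses for the lower bound on this instance.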

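The next lemma shows that the 2-clustering problem can also be alternatively
viewed as a special kind of "subgraph selection" problem.

Lemma 2.2. Let $V_1$ and $V_2$ be any partition of V. Then, $M(V_1) = M(V_2)$.

Proof. Remember that, for any node u, $\sum_{v \in V} \left( a_{u,v} - \frac{d_u d_v}{2m} \right) = 0$. Summing
this identity over $u \in V_1$ and over $u \in V_2$, and using the symmetry of the
summand in u and v, we get

$$\sum_{u \in V_1} \sum_{v \in V_1} \left( a_{u,v} - \frac{d_u d_v}{2m} \right)
= - \sum_{u \in V_1} \sum_{v \in V_2} \left( a_{u,v} - \frac{d_u d_v}{2m} \right)
= - \sum_{u \in V_2} \sum_{v \in V_1} \left( a_{u,v} - \frac{d_u d_v}{2m} \right)
= \sum_{u \in V_2} \sum_{v \in V_2} \left( a_{u,v} - \frac{d_u d_v}{2m} \right)$$

and therefore $M(V_1) = M(V_2)$. □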
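The statement of Lemma 2.2 is easy to test exhaustively on a small graph. A minimal sketch, assuming the fitness of a single community is computed from the pairwise definition; `community_fitness` and `check_lemma_2_2` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import combinations


def community_fitness(n, edges, community):
    """M(C) as a sum over ordered pairs of nodes in C (u == v allowed)."""
    m = len(edges)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    adj = {frozenset(e) for e in edges}
    num = 0  # integer accumulator for sum of (2m * a_uv - d_u * d_v)
    for u in community:
        for v in community:
            a_uv = 1 if frozenset((u, v)) in adj else 0
            num += 2 * m * a_uv - deg[u] * deg[v]
    return Fraction(num, (2 * m) ** 2)


def check_lemma_2_2(n, edges):
    """True if M(V1) == M(V \\ V1) for every subset V1 of the nodes."""
    nodes = set(range(n))
    for r in range(n + 1):
        for part in combinations(range(n), r):
            v1 = set(part)
            if community_fitness(n, edges, v1) != community_fitness(n, edges, nodes - v1):
                return False
    return True


EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
lemma_holds = check_lemma_2_2(7, EDGES)
```

One practical consequence is that a 2-clustering solver only needs to evaluate one side of the cut and double it.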
3. Results for Dense Graphs

3.1. APX-hardness

This hardness result may be contrasted with the results in Section 3.2
where we show that the modularity value can be approximated to within
any constant additive error for dense graphs using the regularity lemma.
However, the APX-hard instances here have modularity values that are very
close to 0 (around $1/n$), thus the constant additive error provides no guarantee
on the approximation ratio.

Theorem 3.1. It is NP-hard to approximate the k-clustering problem, for
any k, on $(n-4)$-regular graphs within a factor of $1 + \varepsilon$ for some constant
$\varepsilon > 0$.

Proof. We reduce the maximum-cardinality independent set problem for 3-regular
graphs (3-MIS) to our problem. An instance of 3-MIS consists of a
3-regular graph $H = (V,E)$, and the goal is to find a maximum cardinality
subset of nodes $V' \subset V$ such that every pair of nodes u and v in $V'$ is
independent, i.e., $\{u,v\} \notin E$. For notational convenience, let $\delta_\ell = 94/194$ and
$\delta_h = 95/194$. The following inapproximability result is known for 3-MIS.

Theorem 3.2. [IJ] For any language L in NP, there exists a polynomial-time
reduction that, given an instance I of L, produces an instance H of
3-MIS with n nodes such that:

• if $I \in L$ then H has a maximum independent set of cardinality at least
$\delta_h n$;

• if $I \notin L$ then every maximum independent set of H is of cardinality at
most $\delta_\ell n$.

We start with an instance I of L and translate it to an instance H of 3-MIS as
described in Theorem 3.2; we refer to such an instance of 3-MIS as a "hard"
instance. Given a hard instance $H = (V,F)$ of 3-MIS with $|V| = n$ nodes
and $|F| = 3n/2$ edges such that a maximum independent set is of size either
at most $\delta_\ell n$ or at least $\delta_h n$, consider the complement $\bar{H} = (V, \bar{F})$ of H, i.e.,
the graph with $\bar{F} = \{ \{u,v\} \mid u,v \in V,\ u \ne v \} \setminus F$. Since H is 3-regular, $\bar{H}$
is $(n-4)$-regular. The input to our 2-clustering problem is this graph $\bar{H}$.
For notational uniformity, we will denote the graph $\bar{H}$ by $G = (V, E)$ with
$E = \bar{F}$. Note that $V' \subseteq V$ is an independent set of H if and only if $V'$ is
a clique in G. Let $\Psi$ and OPT denote the size of a maximum independent
set of H and the optimal modularity value of G, respectively. We prove our
claim by showing the following:

(completeness) If $\Psi \ge \delta_h n$ then $OPT \ge \frac{8\delta_h^2 - 2\delta_h}{n-4} > \frac{0.9388}{n-4}$.

(soundness) If $\Psi \le \delta_\ell n$ then $OPT \le \frac{4\delta_\ell - 1}{n-4} < \frac{0.9382}{n-4}$.
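The gap between the two bounds is pure arithmetic and can be checked with exact rationals. The sketch below assumes the constants are $\delta_\ell = 94/194$ and $\delta_h = 95/194$ and that the numerators of the two bounds are $8\delta_h^2 - 2\delta_h$ and $4\delta_\ell - 1$; both are readings of a damaged source, so treat them as assumptions.

```python
from fractions import Fraction

# Assumed constants of the 3-MIS gap (reconstructed, not authoritative).
DELTA_L = Fraction(94, 194)
DELTA_H = Fraction(95, 194)

# Numerators of the bounds; the common denominator (n - 4) cancels in the ratio.
completeness = 8 * DELTA_H ** 2 - 2 * DELTA_H
soundness = 4 * DELTA_L - 1

# Multiplicative gap between the "yes" and "no" cases.
gap = completeness / soundness
```

With these values the completeness numerator strictly exceeds the soundness numerator, which is what a constant-factor inapproximability gap requires.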

For any subset $V' \subseteq V$ of nodes in G, let $m_{V'}$ be the number of edges in
G with both end-points in $V'$ and $D_{V'}$ be the sum of degrees of nodes in $V'$
in the graph G, i.e., $D_{V'} = \sum_{v \in V'} d_v$.

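3.1.1. Proof of Completeness ($\Psi \ge \delta_h n$)

Lemma 3.3. If $\Psi \ge \delta_h n$ then $OPT \ge \frac{8\delta_h^2 - 2\delta_h}{n-4}$.

Proof. Suppose H has an independent set $V'$ with $|V'| = tn$ for some
$t \ge \delta_h$. Since $V'$ is a clique of G, it follows that $2 m_{V'} = tn(tn-1)$ and
$D_{V'} = tn(n-4)$. Consider the solution $S = \{V', V \setminus V'\}$ of 2-clustering on
G. Using Lemma 2.2 and Equation (3) we get

$$M(S) = 2M(V') = 2 \left( \frac{m_{V'}}{m} - \left( \frac{D_{V'}}{2m} \right)^2 \right)
= \frac{2tn(tn-1)}{n(n-4)} - 2t^2 = \frac{2(4t^2 - t)}{n-4} \ge \frac{2(4\delta_h^2 - \delta_h)}{n-4}$$

where the last inequality holds because $4t^2 - t$ is increasing for $t > 1/8$. □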
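The completeness calculation can be replayed on a concrete small instance. As an illustration only (it is not a hard instance of 3-MIS), take H to be the 3-dimensional hypercube, which is 3-regular on n = 8 nodes, so its complement is 4-regular; the four even-parity nodes are independent in H and hence a clique in the complement, giving t = 1/2. All names below are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def cube_edges():
    """Edges of the hypercube Q3 on nodes 0..7 (a 3-regular graph)."""
    return [(u, u ^ (1 << b)) for u in range(8) for b in range(3)
            if u < (u ^ (1 << b))]


def complement_edges(n, edges):
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]


def fitness(n, edges, community):
    """M(V') = m_{V'}/m - (D_{V'}/2m)^2 for one community."""
    community = set(community)
    m = len(edges)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    inside = sum(1 for u, v in edges if u in community and v in community)
    d_sum = sum(deg[v] for v in community)
    return Fraction(inside, m) - Fraction(d_sum, 2 * m) ** 2


def completeness_example():
    n = 8
    g = complement_edges(n, cube_edges())
    g_set = {frozenset(e) for e in g}
    degrees = [sum(1 for e in g if v in e) for v in range(n)]
    even = [v for v in range(n) if bin(v).count("1") % 2 == 0]
    odd = [v for v in range(n) if v not in even]
    is_clique = all(frozenset(p) in g_set for p in combinations(even, 2))
    total = fitness(n, g, even) + fitness(n, g, odd)
    t = Fraction(len(even), n)
    predicted = 2 * (4 * t * t - t) / (n - 4)
    return degrees, is_clique, total, predicted
```

On this instance the directly computed modularity of the 2-clustering and the closed form $2(4t^2 - t)/(n-4)$ should agree.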

3.1.2. Proof of Soundness ($\Psi \le \delta_\ell n$)

Case I: when an optimal solution has exactly 2 clusters.

Suppose that the optimal solution is $S = \{V', V \setminus V'\}$ of 2-clustering on G
with $|V'| = tn$ and $0 < t \le 1/2$.

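Lemma 3.4. Let $\alpha n$ be the size (number of nodes) of a largest size clique
in the node-induced subgraph $G' = (V', E')$ where $E' = (V' \times V') \cap E$. Then,

$$M(V') \le \frac{4t^2 + 2\alpha - 3t}{n-4}$$

Proof. Since the size of the largest clique in $G'$ is $\alpha n$, each of the remaining
$(t - \alpha)n$ nodes is not connected to at least one node inside the
clique. Hence, using Equation (3), we get

$$M(V') = \frac{m_{V'}}{m} - \left( \frac{D_{V'}}{2m} \right)^2
\le \frac{tn(tn-1) - 2(t-\alpha)n}{n(n-4)} - t^2 = \frac{4t^2 + 2\alpha - 3t}{n-4} \qquad □$$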

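Lemma 3.5. $M(V') \le \frac{2\delta_\ell - 1/2}{n-4}$.

Proof. Using the previous lemma and the facts that $\alpha \le \min\{t, \delta_\ell\}$ and
$t \le 1/2$, we have two cases:

Case 1: $t > \delta_\ell$. Then $M(V') \le \frac{4t^2 + 2\delta_\ell - 3t}{n-4}$. The function $f(t) = 4t^2 - 3t$ is
increasing in the range $(\delta_\ell, 1/2]$ since $\delta_\ell > 3/8$ and $\frac{df}{dt} = 8t - 3 > 0$ if $t > 3/8$.
Thus, $\max_{\delta_\ell < t \le 1/2} f(t) = f(1/2) = -1/2$, and thus $M(V') \le \frac{2\delta_\ell - 1/2}{n-4}$.

Case 2: $t \le \delta_\ell$. Since $\alpha \le t$ and $4t^2 + 2\alpha - 3t$ is an increasing function of
$\alpha$, we have $M(V') \le \frac{4t^2 + 2t - 3t}{n-4} = \frac{4t^2 - t}{n-4}$. The function $f(t) = 4t^2 - t$ satisfies
$f(0) = 0$ and

$$\frac{df}{dt} \;\; \begin{cases} < 0 & \text{if } t < 1/8 \\ > 0 & \text{if } 1/8 < t \le \delta_\ell \end{cases}$$

Thus, $\max_{0 < t \le \delta_\ell} f(t) = 4\delta_\ell^2 - \delta_\ell$ and we have
$M(V') \le \frac{4\delta_\ell^2 - \delta_\ell}{n-4} < \frac{2\delta_\ell - 1/2}{n-4}$. □

Finally, using Lemma 2.2, $M(S) = 2M(V') \le \frac{4\delta_\ell - 1}{n-4}$, completing the soundness
proof for this case.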
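The case analysis above can be cross-checked numerically on a grid of values of t. This is a sketch under two assumptions taken from a damaged source: that the bound of Lemma 3.4 has numerator $4t^2 + 2\alpha - 3t$ and that $\delta_\ell = 94/194$. Because the numerator is increasing in $\alpha$, the worst case for each t is $\alpha = \min\{t, \delta_\ell\}$.

```python
from fractions import Fraction

DELTA_L = Fraction(94, 194)  # assumed value of the soundness constant


def lemma_3_4_numerator(t, alpha):
    """Numerator of the assumed Lemma 3.4 bound (denominator n - 4 omitted)."""
    return 4 * t * t + 2 * alpha - 3 * t


def worst_case(t):
    """Largest value of the numerator over admissible clique fractions alpha."""
    return lemma_3_4_numerator(t, min(t, DELTA_L))


# Grid over 0 < t <= 1/2 that contains both delta_l (= 940/1940) and 1/2.
GRID = [Fraction(j, 1940) for j in range(1, 971)]
best_t = max(GRID, key=worst_case)
best_value = worst_case(best_t)
```

On this grid the maximum is attained at t = 1/2, matching the value $2\delta_\ell - 1/2$ claimed in Lemma 3.5.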

Case II: when an optimal solution has more than 2 clusters. 

For convenience of calculations, we would like to drop the $\frac{1}{2m}$ scaling term
from Equation (1). To this end, we define $M^{new}(C) = n(n-4)\, M(C)$. Let
$S = \{V_1, V_2, \ldots, V_{m+1}\}$ be an optimal solution of modularity clustering that
uses a minimum number of clusters, where $m > 1$. Let $|V_i| = t_i n$, and suppose that
$V_i' \subseteq V_i$ is a largest clique of size $\alpha_i n$ in the graph $(V_i, (V_i \times V_i) \cap E)$.
Note that $0 < \alpha_i \le \min\{t_i, \delta_\ell\}$ for all $1 \le i \le m+1$, $\sum_{i=1}^{m+1} t_i = 1$ and we
need to show that $M^{new}(S) \le (4\delta_\ell - 1)\, n$. Let $\bar{V}_i$ denote $V \setminus V_i$.

Lemma 3.6. $M^{new}(V_i) \le (4t_i^2 - t_i)\, n$.
Proof. $M^{new}(V_i)$ is maximized when the nodes in $V_i$ form a clique. Thus,

$$M^{new}(V_i) \le -(n-4)\, t_i\, (t_i n) + (t_i n - 1)(t_i n) = (4t_i^2 - t_i)\, n \qquad □$$

Corollary 3.7. If $|V_i| \le n/4$ then $M^{new}(V_i) \le 0$. If $|V_i| = \left(\frac{1}{4} + \delta\right) n > n/4$
then $M^{new}(V_i) \le (4\delta^2 + \delta)\, n$.

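Lemma 3.8. Suppose that $t_i = \frac{1}{2} + \delta > \frac{1}{2}$ for some $0 < \delta < 1/2$ and $\alpha_i n$ is the
size of a largest clique in $(V_i, (V_i \times V_i) \cap E)$. Then,

$$M^{new}(V_i) \le \left( 2\delta_\ell - \tfrac{1}{2} \right) n$$

Proof. Note that $|\bar{V}_i| / n = \frac{1}{2} - \delta < 1/2$. Then by Lemma 2.2,

$$M^{new}(V_i) = M^{new}(\bar{V}_i) \le \left( 4\left(\tfrac{1}{2} - \delta\right)^2 + 2\alpha_i - 3\left(\tfrac{1}{2} - \delta\right) \right) n
= \left( 4\delta^2 - \delta - \tfrac{1}{2} + 2\alpha_i \right) n$$

where the inequality follows from Lemma 3.4 if we replace $V_i$ by $\bar{V}_i$. Since
$t_i \ge 1/2$, we have

$$4\delta^2 - \delta - \tfrac{1}{2} + 2\alpha_i = 4t_i^2 - 5t_i + 1 + 2\alpha_i \le 4t_i^2 + 2\alpha_i - 3t_i$$

Since $\alpha_i \le \delta_\ell < t_i$, the arguments in Lemma 3.5 can be directly applied on
$4t_i^2 + 2\alpha_i - 3t_i$ to show that $\left( 4\delta^2 - \delta - \tfrac{1}{2} + 2\alpha_i \right) n \le \left( 2\delta_\ell - \tfrac{1}{2} \right) n$. □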

Let us call a cluster $V_i$ a giant component if $t_i > \delta_\ell$. Note that since
$3\delta_\ell > 1$, we can have at most two giant components. We have therefore
three cases depending on the number of giant components.

Case (i): S has no giant components Note that S can have at most
three clusters containing strictly more than $n/4$ nodes.

If S contains no such cluster then by Corollary 3.7 $M^{new}(S) \le 0$.

If S contains exactly one such cluster, say $V_1$, then
$M^{new}(S) \le M^{new}(V_1) \le \left(2\delta_\ell - \tfrac{1}{2}\right) n < (4\delta_\ell - 1)\, n$
by Lemma 3.5 (if $t_1 \le 1/2$) or Lemma 3.8 (if $t_1 > 1/2$).

If S contains exactly two such clusters, say $V_1$ and $V_2$, then again
$M^{new}(S) \le M^{new}(V_1) + M^{new}(V_2) \le 2\left(2\delta_\ell - \tfrac{1}{2}\right) n = (4\delta_\ell - 1)\, n$
by Lemma 3.5 and Lemma 3.8.

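Otherwise, suppose that S contains exactly three such clusters, say $V_1$, $V_2$
and $V_3$. Let $t_i = \frac{1}{4} + \delta_i$ for i = 1, 2, 3. Then, $0 < \delta_1 + \delta_2 + \delta_3 \le 1/4$. Using
Corollary 3.7 we have:

$$\sum_{i=1}^{3} M^{new}(V_i) \le \left( 4 \sum_{i=1}^{3} \delta_i^2 + \sum_{i=1}^{3} \delta_i \right) n
\le \left( 4 \Big( \sum_{i=1}^{3} \delta_i \Big)^2 + \sum_{i=1}^{3} \delta_i \right) n
\le \left( \frac{1}{4} + \frac{1}{4} \right) n < (4\delta_\ell - 1)\, n$$
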

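Case (ii): S has one giant component Let $V_1$ be the giant component.
Since $1 - t_1 < 1 - \delta_\ell < 3/4$, there are at most two other clusters with strictly
more than $n/4$ nodes.

Subcase (ii-a): there is one other cluster with strictly more than
$n/4$ nodes Let this cluster be $V_2$. By Corollary 3.7, $\sum_{j=3}^{m+1} M^{new}(V_j) \le 0$.
Note that $t_2 \le \delta_\ell$. Now, by reusing the calculations of Lemma 3.5 and using
Lemma 3.8 we get

$$M^{new}(S) = M^{new}(V_1) + M^{new}(V_2) + \sum_{j=3}^{m+1} M^{new}(V_j) \le M^{new}(V_1) + M^{new}(V_2)
\le \left(2\delta_\ell - \tfrac{1}{2}\right) n + \left(2\delta_\ell - \tfrac{1}{2}\right) n = (4\delta_\ell - 1)\, n$$

where the first term is bounded by Lemma 3.8 if $t_1 > 1/2$ and by Lemma 3.5
if $t_1 \le 1/2$, and the second term is bounded by Lemma 3.5 since $t_2 \le \delta_\ell$.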

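Subcase (ii-b): there are two other clusters with strictly more than
$n/4$ nodes Let these clusters be $V_2$ and $V_3$. Then, $\delta_\ell n < |V_1| < n/2$. By
Corollary 3.7, $\sum_{j=4}^{m+1} M^{new}(V_j) \le 0$. Let $t_2 = \frac{1}{4} + \delta_2$ and $t_3 = \frac{1}{4} + \delta_3$ with
$0 < \delta_2 \le \delta_3 < \frac{1}{2} - \delta_\ell < 2/100$. Thus,

$$M^{new}(S) \le \left(2\delta_\ell - \tfrac{1}{2}\right) n + \left(4\delta_2^2 + \delta_2\right) n + \left(4\delta_3^2 + \delta_3\right) n$$

where the first term is bounded by Lemma 3.5 since $t_1 < 1/2$, and the second
and third terms are bounded by Corollary 3.7.
Since $4\delta_2^2 + \delta_2 + 4\delta_3^2 + \delta_3 < 8 (2/100)^2 + 2 (2/100) < 2\delta_\ell - \tfrac{1}{2}$, we have
$M^{new}(S) < (4\delta_\ell - 1)\, n$.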

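Case (iii): S has two giant components Let $V_1$ and $V_2$ be the two giant
components with $t_1 = \delta_\ell + \mu_1$ and $t_2 = \delta_\ell + \mu_2$ for some $0 < \mu_1 \le \mu_2 <
1 - 2\delta_\ell$. Since $\left| \cup_{j=3}^{m+1} V_j \right| = (1 - t_1 - t_2)\, n < (1 - 2\delta_\ell)\, n < n/4$, by Corollary 3.7,
$\sum_{j=3}^{m+1} M^{new}(V_j) \le 0$. Now, by reusing the calculations in the proof of the
case of $t > \delta_\ell$ of Lemma 3.5 and using Lemma 3.8 we get

$$M^{new}(S) = M^{new}(V_1) + M^{new}(V_2) + \sum_{j=3}^{m+1} M^{new}(V_j)
\le \left(2\delta_\ell - \tfrac{1}{2}\right) n + \left(2\delta_\ell - \tfrac{1}{2}\right) n = (4\delta_\ell - 1)\, n \qquad □$$

where the first term is bounded by Lemma 3.8 if $t_1 > 1/2$ and by Lemma 3.5
if $t_1 \le 1/2$, and the second term is bounded by Lemma 3.8 if $t_2 > 1/2$ and by
Lemma 3.5 if $t_2 \le 1/2$.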

3.2. Additive Approximations for Locally Dense Graphs

Using the algorithmic version of the regularity lemma in [18] we can show
that if the given graph is dense then, for any given constant $\alpha > 0$, there is
a polynomial-time algorithm that returns a solution of modularity value at
least $OPT - \alpha$.

Proposition 3.9 (constant additive error). Suppose that the given graph
$G = (V,E)$ is dense, i.e., $m = |E| = \delta n^2$ for some constant $0 < \delta < 1/2$.
Then, for any given constant $0 < \alpha < 1$, there is a polynomial-time algorithm
that returns a solution of value at least $OPT - \alpha$.



Proof. The $\ell$-way cut problem is defined as follows. We are given a weighted
graph $G = (V, E)$ with $w(u,v) \in \mathbb{R}$ being the weight of the edge $\{u,v\} \in E$.
A valid solution is a partition of V into $\ell$ subsets $S = \{S_1, S_2, \ldots, S_\ell\}$, and the
goal is to maximize the sum of weights of those edges whose end-points are
in different subsets, i.e., maximize $w(S) = \sum_{\{u,v\} \in E(S)} w(u,v)$ where
$E(S) = \{ \{u,v\} \mid \forall\, 1 \le j \le \ell : |\{u,v\} \cap S_j| \ne 2 \}$ is the set of all
"inter-partition" edges. The following result was proved in [18].

Theorem 3.10. [18] Given a weighted graph $G = (V, E)$ of n nodes and any
constant $0 < \varepsilon < 1$ there is a polynomial-time algorithm $A_\varepsilon$ which computes
a partition S of V such that

$$w(S) \ge w(S^*) - \varepsilon n^2$$
where $S^*$ is an optimal (maximum weight) partition.

Equation (5) can be used to assign edge weights to cast our modularity
clustering problem as an $\ell$-way cut problem in the following manner. Consider
the complete graph on n nodes ($K_n$) and let $w_{u,v} = 2\delta \left( \frac{d_u d_v}{2m} - a_{u,v} \right)$
for the edge $\{u,v\}$ of $K_n$. Then, for a partition $S = \{S_1, S_2, \ldots, S_\ell\}$ of the
nodes of $K_n$,

$$w(S) = \sum_{\{u,v\} \in E(S)} 2\delta \left( \frac{d_u d_v}{2m} - a_{u,v} \right) = 2m\delta\, M(S) = 2\delta^2 n^2 M(S)$$

Let $APX_\varepsilon$ be the objective value of an approximate solution of the modularity
clustering problem on the given graph obtained by using the $\ell$-way
partitioning of Theorem 3.10 with $\varepsilon = 2\alpha\delta^2$. Then,

$$2\delta^2 n^2\, APX_\varepsilon \ge 2\delta^2 n^2\, OPT - \varepsilon n^2 \;\equiv\; APX_\varepsilon \ge OPT - \alpha \qquad □$$
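The weight assignment used in this proof can be verified on a small example. The sketch below assumes the weights are $w_{u,v} = 2\delta (d_u d_v/(2m) - a_{u,v})$ on the complete graph with $\delta = m/n^2$, and checks that the total weight of inter-cluster pairs equals $2m\delta\, M(S)$. The function name and the example graph are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def cut_weight_and_modularity(n, edges, labels):
    """Return (w(S), M(S), delta, m) for the partition labels[v] = cluster id."""
    m = len(edges)
    delta = Fraction(m, n * n)  # density parameter: m = delta * n^2
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    adj = {frozenset(e) for e in edges}

    # Weight of all pairs of K_n whose end-points lie in different clusters.
    w = Fraction(0)
    for u, v in combinations(range(n), 2):
        if labels[u] != labels[v]:
            a_uv = 1 if frozenset((u, v)) in adj else 0
            w += 2 * delta * (Fraction(deg[u] * deg[v], 2 * m) - a_uv)

    # Modularity from the pairwise definition (ordered pairs, u == v allowed).
    num = 0
    for u in range(n):
        for v in range(n):
            if labels[u] == labels[v]:
                a_uv = 1 if frozenset((u, v)) in adj else 0
                num += 2 * m * a_uv - deg[u] * deg[v]
    modularity = Fraction(num, (2 * m) ** 2)
    return w, modularity, delta, m


EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
w, M, delta, m = cut_weight_and_modularity(7, EDGES, [0, 0, 0, 1, 1, 1, 2])
```

The identity holds for every partition, so an additive guarantee on the cut weight translates directly into an additive guarantee on modularity after dividing by $2m\delta$.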



4. Hardness and Approximation Algorithms for Sparse Graphs 
4.1. NP-hardness



Brandes et al. [10] proved NP-hardness of the 2-clustering problem provided
nodes with very large degrees are allowed in the input graph. Thus
it is not a priori clear whether calculating modularity on very sparse graphs
becomes easy and admits an exact polynomial-time algorithm. However, we
rule out this possibility of exact solution. Our construction is similar to
that in [10], but carefully replaces dense graphs with nicely behaving sparse
graphs. We have to do a more careful analysis of the properties of an optimal
2-clustering so as to get the following result.

Theorem 4.1. Computing $OPT_2$ is NP-complete even for d-regular graphs
for any constant $d \ge 9$.

Proof. The decision version 2BdRegModularity of our problem is as follows:

given a d-regular graph G and a number K, is there a clustering
S of G into at most two clusters for which $M(S) \ge K$?

Our reduction is from the minimum graph bisection problem for 4-regular
graphs (MB4): Given a 4-regular graph G with n nodes (with even n) and an
integer c, is there a clustering into two clusters each of $n/2$ nodes such that it
"cuts" at most c edges, i.e., at most c edges have two end-points in different
clusters? MB4 is known to be NP-complete [2J]. We reduce an instance G of
MB4 to an instance of 2BdRegModularity in a manner similar to that in [10].
Every node in G is replaced by a copy of an n-node d-regular graph H such
that the minimum cut (minimum number of edges in a cut) of H is at least d.
Such a family of graphs can be constructed in the following recursive manner:

• For $d = 2$, the 2-regular graph, namely a simple cycle consisting of $n$
nodes, has a minimum cut of 2 edges.

• For $d = 3$, consider two simple cycles $H_1 = (V_1, E_1)$ and $H_2 = (V_2, E_2)$,
each consisting of $n/2$ nodes. Consider an arbitrary matching between
the nodes of $H_1$ and $H_2$ and add the edges corresponding to this matching
to obtain a 3-regular graph $H = (V, E)$. Consider an arbitrary
subset of nodes $V' \subset V$ of $H$. Then,

    - If $V' \cap V_1 \neq \emptyset$ and $V' \cap V_2 \neq \emptyset$, then the number of cut edges is at
      least 4.

    - Otherwise, assume that $V' \cap V_1 = \emptyset$ (the other case is symmetric)
      and thus $V' \subseteq V_2$. If $V' = V_2$ then the number of cut
      edges is exactly $n/2 > 2$. Otherwise, the number of cut edges is
      at least 2 (corresponding to two edges of the cycle in $H_2$) plus 1
      (corresponding to one of the matching edges added).

• For $d > 3$, a recursive construction of such graphs follows in a similar
manner: take such a $(d-2)$-regular graph $H$ on $n$ nodes for which the
inductive hypothesis applies and add a simple cycle to $H$ all of whose
edges are different from those in $H$. Consider a cut in this graph. By
the induction hypothesis the cut contains at least $d - 2$ edges of $H$ and
at least 2 additional edges of the new cycle added to $H$.
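The base cases of this construction can be checked by brute force on small instances. The sketch below is only an illustration under stated assumptions: the helper names (`cycle_edges`, `build_h3`, `min_cut`) are hypothetical, the matching is taken to be the identity matching (one arbitrary choice among many), and the exhaustive cut enumeration is feasible only for very small $n$.

```python
def cycle_edges(nodes):
    """Edges of a simple cycle that visits `nodes` in the given order."""
    k = len(nodes)
    return {frozenset((nodes[i], nodes[(i + 1) % k])) for i in range(k)}


def build_h3(n):
    """d = 3 case: two cycles on n/2 nodes each, joined by a perfect matching.

    Assumption: the matching pairs node i with node i + n/2.
    """
    half = n // 2
    edges = cycle_edges(list(range(half))) | cycle_edges(list(range(half, n)))
    edges |= {frozenset((i, i + half)) for i in range(half)}
    return edges


def min_cut(n, edges):
    """Minimum number of edges crossing any cut, by exhaustive search."""
    best = len(edges)
    # Node n-1 is kept on the far side, so every cut is enumerated once.
    for mask in range(1, 2 ** (n - 1)):
        side = {v for v in range(n) if (mask >> v) & 1}
        crossing = sum(1 for e in edges if len(e & side) == 1)
        best = min(best, crossing)
    return best
```

For example, `min_cut(6, cycle_edges(list(range(6))))` examines the 6-cycle, and `min_cut(8, build_h3(8))` examines the 3-regular graph on 8 nodes built from two 4-cycles.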

Let $H_v$ denote the copy of $H$ corresponding to the node $v \in G$. Delete two
independent edges (i.e., edges without any common end-points) in $H_v$. The
four edges connected to $v$ are now connected to the four endpoints of these
deleted edges. This is done in order to make the final graph $G'$ $d$-regular.

Footnote: This is one step that is different from the reduction in [10], where
every node in $G$ is replaced by a copy of $K_n$, producing a final graph with
non-constant degrees. Since $G$ is 4-regular, we need $d > 8$.

Note that the number of nodes in the transformed graph $G'$ is $n^2$, whereas
the number of edges is $m = dn^2/2$. Since two edges are removed from $H$ in the
construction, the minimum cut in each modified copy of $H$ is at least $d - 2$.
Let $S^*$ be an optimal clustering of $G'$. The correctness of the reduction
follows by showing that MB4 has a solution with at most $c$ cut edges if and
only if $\mathsf{M}(S^*) \ge \frac{1}{2} - \frac{c}{m}$.

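Lemma 4.2. $S^*$ has exactly two clusters and $\mathsf{M}(S^*) > 0$.

Proof. It suffices to show a clustering $S = \{C_1, C_2\}$ such that $\mathsf{M}(S) > 0$. To
this end, let $C_1 = H_v$ for some $v$, and let $C_2$ contain the rest. Then $D_1 = dn$,
exactly 4 edges leave $C_1$, and, using the fact that $d(n-1) > 4$, we get

$$\mathsf{M}(S) = \frac{D_1(2m - D_1)}{2m^2} - \frac{4}{m} = \frac{dn\,(dn^2 - dn)}{2\left(\frac{dn^2}{2}\right)^2} - \frac{4}{\frac{dn^2}{2}} = \frac{2d(n-1) - 8}{dn^2} > 0 \qquad \Box$$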
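The arithmetic in this proof can be replayed with exact rational numbers. The following is a minimal sketch, not part of the original argument; the function names are hypothetical, and it assumes the two-cluster form of modularity $\frac{D_1 D_2}{2m^2} - \frac{m_{12}}{m}$ with $D_2 = 2m - D_1$.

```python
from fractions import Fraction


def two_cluster_modularity(d1, m, m12):
    """Modularity of a 2-clustering: D1*D2/(2 m^2) - m12/m, with D2 = 2m - D1."""
    return Fraction(d1 * (2 * m - d1), 2 * m * m) - Fraction(m12, m)


def lemma_4_2_value(d, n):
    """Clustering {H_v, rest} in G': D1 = d*n, m = d*n^2/2, and 4 cut edges.

    Assumes n is even, so that m is an integer.
    """
    m = d * n * n // 2
    return two_cluster_modularity(d * n, m, 4)


def closed_form(d, n):
    """The closed form (2d(n-1) - 8) / (d n^2) stated in the proof."""
    return Fraction(2 * d * (n - 1) - 8, d * n * n)
```

Comparing `lemma_4_2_value(d, n)` with `closed_form(d, n)` for a few even values of `n` confirms that the two expressions agree and are positive whenever $d(n-1) > 4$.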

The next lemma shows how to normalize a solution without decreasing 
the modularity value. Part (a) of the lemma states that S* cannot have any 
copy of H split across clusters, whereas part (b) implies that any optimal 
clustering has to be a bisection of the graph. 

Lemma 4.3. It is possible to normalize an optimal solution $S^*$ without
decreasing the modularity value such that the following two conditions hold:

(a) For every $v \in G$, there exists a cluster $C \in S^*$ such that $H_v \subseteq C$.

(b) Each cluster in $S^*$ contains exactly $n/2$ copies of $H$.

Proof. Suppose the set of nodes of $G'$ is partitioned into three subsets $A$, $B$
and $C$. Let $S_1 = \{A \cup C, B\}$, and suppose we want to transfer the nodes in $C$ to
the other cluster to form the clustering $S_2 = \{A, B \cup C\}$. For any two disjoint
subsets $X$ and $Y$ of nodes of $G'$, let $m_{XY}$ denote the number of edges one of
whose endpoints is in $X$ and the other in $Y$, and let $D_X = \sum_{v \in X} d_v$ denote the
sum of degrees of nodes in $X$. Then, using Equation (3) or Equation (5),
the gain in modularity $\Delta = \mathsf{M}(S_2) - \mathsf{M}(S_1)$ can be simplified and written as

$$\Delta = \frac{D_C\,(D_A - D_B)}{2m^2} + \frac{m_{BC} - m_{AC}}{m}.$$

Using the fact that $G'$ is $d$-regular and substituting for $m$, we get

$$\frac{dn^4}{2}\,\Delta = d\,|C|\,(|A| - |B|) + n^2\,(m_{BC} - m_{AC}) \qquad (6)$$

(a) Let us assume that there exists a $v \in G$ such that $H_v$ is split across
clusters in the optimal clustering $S^* = \{C_1, C_2\}$. Without loss of generality,
we can assume that $|C_1 \setminus H_v| \ge |C_2 \setminus H_v|$. We will transfer the part of $H_v$
in $C_1$ from $C_1$ to $C_2$. Let $A = C_1 \setminus H_v$, $B = C_2$, $C = H_v \setminus C_2$, and $|C| = k$.
Then the part of $H_v$ in $C_2$ has a size of $n - k$. By our assumption,

$$|A| - |B| = |C_1 \setminus H_v| - |C_2| = |C_1 \setminus H_v| - |C_2 \setminus H_v| - |H_v \cap C_2| \ge -(n - k).$$

Substituting this in Equation (6), we get

$$\frac{dn^4}{2}\,\Delta \ge d\,[-k(n-k)] + n^2\,(m_{BC} - m_{AC}).$$

Now, since the original graph $G$ was 4-regular, at most 4 extra inter-cluster
edges will appear after the transfer. Thus, $m_{AC} \le 4$. The term $m_{BC}$
represents the number of edges between $C_2$ and $H_v \cap C_1$, which is at least the
number of edges between the two parts of $H_v$. Thus, $m_{BC}$ is at least the
number of edges in a minimum cut of $H_v$, which is at least $d - 2$. This gives

$$\frac{dn^4}{2}\,\Delta \ge -dk(n-k) + n^2(d - 2 - 4) \ge -\frac{dn^2}{4} + (d-6)\,n^2 = \left(\frac{3d}{4} - 6\right) n^2 > 0$$

where the second inequality is due to the fact that $k(n-k)$ is maximized
when $k = n/2$, and the last inequality is satisfied when $d \ge 9$. Hence the
modularity can be strictly improved by putting each copy of $H$ completely
in a cluster.

(b) By the previous part, each $H_v$ is contained completely in one cluster of
$S^* = \{C_1, C_2\}$. Now assume that $C_1$ has more copies of $H$ than $C_2$. Since
$n$ is even, this implies that $C_1$ has at least two more copies of $H$ than $C_2$.
We will create a new clustering by transferring a copy of $H$ from $C_1$ to $C_2$.
Then the gain in modularity after this transfer is given by Equation (6),
where $C$ denotes the transferred copy of $H$, $B = C_2$ and $A = C_1 \setminus C$. By
our assumption, $|A| - |B| \ge |C|$. Therefore we can simplify the first term
and get $\frac{dn^4}{2}\Delta \ge d\,|C|^2 + n^2(m_{BC} - m_{AC})$. Also, since the original graph
$G$ was 4-regular, at most 4 extra inter-cluster edges will appear after the
transfer. Simplifying and substituting values, $\frac{dn^4}{2}\Delta \ge dn^2 - 4n^2 > 0$. Hence,
the modularity can be strictly improved by balancing out the copies of $H$ in
both clusters. $\Box$
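The gain formula that drives both parts of this proof, $\Delta = \frac{D_C (D_A - D_B)}{2m^2} + \frac{m_{BC} - m_{AC}}{m}$, holds for any unweighted graph and any partition of its nodes into $A$, $B$, $C$, so it can be sanity-checked numerically. The sketch below is illustrative only: the function names and the small test graph are assumptions, and modularity is evaluated cluster by cluster as $m_i/m - (D_i/2m)^2$.

```python
from fractions import Fraction


def modularity(edges, clusters):
    """Sum over clusters of m_i/m - (D_i/(2m))^2 for an unweighted simple graph."""
    m = len(edges)
    total = Fraction(0)
    for cluster in clusters:
        c = set(cluster)
        m_i = sum(1 for u, v in edges if u in c and v in c)
        d_i = sum((u in c) + (v in c) for u, v in edges)
        total += Fraction(m_i, m) - Fraction(d_i, 2 * m) ** 2
    return total


def gain_by_formula(edges, a, b, c):
    """Gain of moving C from the cluster A ∪ C into the cluster B ∪ C."""
    m = len(edges)

    def degree_sum(x):
        return sum((u in x) + (v in x) for u, v in edges)

    def between(x, y):
        return sum(1 for u, v in edges
                   if (u in x and v in y) or (u in y and v in x))

    return (Fraction(degree_sum(c) * (degree_sum(a) - degree_sum(b)), 2 * m * m)
            + Fraction(between(b, c) - between(a, c), m))
```

With a hypothetical 7-node graph and the partition `A = {0, 1}`, `C = {2, 3}`, `B = {4, 5, 6}`, the direct difference `modularity(edges, [A, B | C]) - modularity(edges, [A | C, B])` can be compared with `gain_by_formula(edges, A, B, C)`.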

Armed with the above lemma, one can now prove the NP-completeness
of our problem. We use the above construction to reduce an instance
$(G, c)$ of MB4 to an instance $(G', K)$ of 2BdRegModularity with
$K = \frac{1}{2} - \frac{c}{m}$. Now suppose $S^* = \{C_1, C_2\}$ is an optimal 2-clustering of $G'$. Then,
$\mathsf{M}(S^*) = \frac{D_1 D_2}{2m^2} - \frac{m_{12}}{m}$. By Lemma 4.3(b), $D_1 = D_2 = m$. Also, because of
Lemma 4.3(a), $m_{12}$ only has edges from $G$, thus representing a bisection of
$G$. Therefore, $m_{12} \le c$ if and only if $\mathsf{M}(S^*) \ge \frac{1}{2} - \frac{c}{m} = K$. $\Box$

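4.2. Large Integrality Gap for an ILP Formulation

$$\begin{array}{ll}
\text{maximize} & \displaystyle \frac{1}{2m} \sum_{u,v} \left( a_{u,v} - \frac{d_u d_v}{2m} \right) (1 - x_{u,v}) \\[2ex]
\text{subject to} & \forall\, u \neq v \neq z:\; x_{u,z} \le x_{u,v} + x_{v,z} \\[1ex]
 & \forall\, u \neq v:\; 0 \le x_{u,v} \le 1
\end{array}$$

Figure 1: LP-relaxation of modularity clustering [1, 10, 12].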

There is an integer linear programming (ILP) formulation of modularity
clustering with arbitrarily many clusters, as shown in Fig. 1: $x_{u,v} = 0$ if $u$ and
$v$ belong to the same cluster and 1 otherwise, and the "triangle inequality"
constraints $x_{u,z} \le x_{u,v} + x_{v,z}$ ensure that if $\{u, v\}$ and $\{v, z\}$ belong to the
same cluster then $\{u, z\}$ also belongs to the same cluster. Agarwal and
Kempe [1] used such an LP-relaxation with several rounding schemes for
empirical evaluations. However, as we show below, the worst-case integrality
gap of the LP-relaxation is at least about the square root of the degree of
the graph, thereby ruling out logarithmic approximations via rounding such
LP-relaxations.

Lemma 4.4. For every $d > 3$ and for all sufficiently large $n$, there exists a
$d$-regular graph with $n$ nodes such that the integrality gap of the LP-relaxation
in Fig. 1 is $\Omega(\sqrt{d})$.

Proof. Let OPT/ be the optimal objective value of the LP-relaxation. For 
any graph G = {V,E), a valid fractional solution of the LP-relaxation is as 
follows: set Xu,v = ^ for every {u, v} & E and set Xu,v = 1 otherwise. The 
value of this fractional solution is precisely | —J2v&v Thus, in particular, 
if G is a d-regular graph then OPT j > | — ^. 

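On the other hand, suppose that $G$ is a random $d$-regular graph and let
$\lambda$ be the second largest eigenvalue of the adjacency matrix $A$ of $G$. It is
well-known that $\lambda \le \beta\sqrt{d}$ for some positive constant $\beta$ [17]. Consider an
optimal solution $\emptyset \subset V' \subset V$ of 2-clustering of $G$ with $0 < |V'| = \alpha n \le n/2$
and let $\mathrm{cut}(V')$ denote the number of edges between $V'$ and $V \setminus V'$. By the
expander mixing lemma, we have

$$\left| \mathrm{cut}(V') - \frac{d\,(\alpha n)\,(1-\alpha)n}{n} \right| = \left| \mathrm{cut}(V') - \alpha(1-\alpha)\,dn \right| \le \lambda \sqrt{\alpha(1-\alpha)}\; n$$

which implies $\mathrm{cut}(V') \ge \alpha(1-\alpha)\,dn - \lambda\sqrt{\alpha(1-\alpha)}\,n \ge \alpha(1-\alpha)\,dn - \beta\sqrt{d}\,n$.
Let $\mathrm{uncut}(V')$ denote the number of edges between pairs of nodes in $V'$.
Then, $\mathrm{uncut}(V') = \frac{\alpha dn - \mathrm{cut}(V')}{2} \le \frac{\alpha^2 dn + \beta\sqrt{d}\,n}{2}$. Using this in the expression
for the modularity of a cluster (with $m = dn/2$) together with Lemma 2.1 and
Lemma 2.2 shows

$$\mathsf{M}(V') = \frac{2 \times \mathrm{uncut}(V')}{dn} - \left( \frac{\alpha dn}{dn} \right)^2 \le \frac{\beta}{\sqrt{d}} \;\Longrightarrow\; \mathrm{OPT} \le 2\,\mathrm{OPT}_2 = 4\,\mathsf{M}(V') \le \frac{4\beta}{\sqrt{d}} \;\Longrightarrow\; \frac{\mathrm{OPT}_f}{\mathrm{OPT}} = \Omega(\sqrt{d}) \qquad \Box$$

4.3. Logarithmic Approximation
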
Newman [27] extended the modularity measure to weighted graphs in
the following manner. Let $G = (V, E, \ell)$ be the input weighted graph with
$\ell : E \to \mathbb{R}^+$ being the function mapping edges to non-negative real-valued
weights. Now, if we redefine $d_u = \sum_{\{u,v\} \in E} \ell(u,v)$ as the "weighted" degree
of the node $u$, $m = \frac{1}{2}\sum_{u \in V} d_u$, and $A = [a_{u,v}]$ as the weighted adjacency matrix
of $G$ (i.e., $a_{u,v} = \ell(u,v)$ if $\{u,v\} \in E$ and 0 otherwise), then Equation (1)
applies to the weighted case also. The corresponding modification in
Equation (3) can be obtained by redefining $m_i$ as the total weight of edges whose
both endpoints are in the cluster $C_i$, $m_{ij}$ as the total weight of edges one of
whose endpoints is in $C_i$ and the other in $C_j$, and $D_i = \sum_{v \in C_i} d_v$ as the sum
of weighted degrees of nodes in cluster $C_i$. It is straightforward to see that
Lemma 2.1 holds even for weighted graphs.

We denote the weighted degree, the maximum weighted degree and the
average weighted degree of a node $v$ by $d_v$, $d_{\max} = \max_{v \in V}\{d_v\}$ and
$\Delta = \frac{\sum_{v \in V} d_v}{2n}$, respectively, and, for convenience, we normalize all the weights
such that $\sum_{v \in V} d_v$ is twice the number of edges of $G$.



Footnote: It is easy to see that the modularity value of any clustering remains
unchanged if all weights are scaled by the same factor.
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Theorem 4.5. 

(a) There exists a polynomial time $O(\log d)$-approximation for $d$-regular graphs
with $d < \frac{n}{2 \ln n}$.

(b) There exists a polynomial time $O(\log d_{\max})$-approximation for weighted
graphs with $d_{\max} < \frac{\sqrt[5]{n}}{16 \ln n}$.

Proof. We begin with the approximation algorithm for regular graphs, which
is somewhat easier to analyze, and later generalize the results to weighted
graphs. A common theme for both proofs is the following approach. By
Lemma 2.1, $\mathrm{OPT}_2 \ge \mathrm{OPT}/2$, and thus it suffices to provide a logarithmic
approximation for the 2-clustering problem on $G$. For notational convenience let

Wuv = — As observed in 29|, letting x„ G { — 1,1} be the indicator 

2 m 

variable denoting the partition that node u & V belongs to. Equation ([2]) 
can be rewritten for a 2-clustering as M(iS) = Yliuv&v'^u,v{.^ + XuX^) = 
"l^uvev '^u,vXuXv = x'^W^x where x G { — 1,1}" is a column vector of the 
indicator variables and W = [wu,v] ^ ffi"^" is the corresponding symmetric 
matrix. The following result is known on quadratic forms. 

Theorem 4.6. [13] Consider maximizing $x^T Z x$ subject to $x \in \{-1, 1\}^n$,
where $Z = [z_{i,j}]$ is an $n \times n$ real matrix with $z_{i,i} \ge 0$. Then, for any $T > 1$,
there exists a randomized approximation algorithm whose objective value $\kappa$
satisfies

$$\mathrm{E}[\kappa] \ge \frac{\max_{x \in \{-1,1\}^n} x^T Z x}{T^2} - 8 e^{-T^2/2} \sum_{i \neq j} |z_{i,j}|.$$

The above approximation does not directly apply to the quadratic form
for modularity clustering since the diagonal entries are negative in our case.
Moreover, the lower bound on the optimal value of the quadratic form as
used in [13] depends on $n$, which we would like to avoid.
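To make the $\pm 1$ encoding concrete, the sketch below evaluates the modularity of a 2-clustering as a quadratic form and compares it with the cluster-by-cluster formula. It is an illustration under explicit assumptions rather than the paper's own normalization: here $B = [a_{u,v} - d_u d_v / 2m]$ and the modularity is computed as $\frac{1}{4m} x^T B x$, which uses $\sum_{u,v} B_{u,v} = 0$; the function names are hypothetical.

```python
from fractions import Fraction


def modularity_by_clusters(edges, clusters):
    """Sum over clusters of m_i/m - (D_i/(2m))^2 (unweighted simple graph)."""
    m = len(edges)
    total = Fraction(0)
    for cluster in clusters:
        c = set(cluster)
        m_i = sum(1 for u, v in edges if u in c and v in c)
        d_i = sum((u in c) + (v in c) for u, v in edges)
        total += Fraction(m_i, m) - Fraction(d_i, 2 * m) ** 2
    return total


def modularity_matrix(n, edges):
    """B[u][v] = a_uv - d_u d_v / (2m); every diagonal entry is negative."""
    m = len(edges)
    deg = [0] * n
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adj[u][v] += 1
        adj[v][u] += 1
    return [[adj[u][v] - Fraction(deg[u] * deg[v], 2 * m) for v in range(n)]
            for u in range(n)]


def modularity_quadratic(n, edges, x):
    """(1/(4m)) * x^T B x for a +1/-1 vector x encoding a 2-clustering."""
    b = modularity_matrix(n, edges)
    m = len(edges)
    form = sum(b[u][v] * x[u] * x[v] for u in range(n) for v in range(n))
    return form / (4 * m)
```

For two triangles joined by a single edge, assigning $+1$ to one triangle and $-1$ to the other gives the same value through both routes, and the diagonal of `modularity_matrix` is strictly negative, which is the obstacle mentioned above.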



(a) The Case When the Input Graph is Regular.

The proof of the following lemma uses a result in [22] on the size of
a maximum-cardinality matching of a regular graph. The lemma is
tight in the sense that there exist $d$-regular graphs for which $\mathrm{OPT} = O(1/\sqrt{d})$
(the proof of Lemma 4.4 shows that $d$-regular expanders are one such class
of graphs).

Lemma 4.7 (Lower Bound for OPT). If n > 40rf^ then OPT > else 

OpT> 0|6_4 

a n 
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Proof. Consider a maximum-cardinality matching {ui, Vi}, . . . , {uk, Vk} of G 
of size k. It is known 22| that for any d > 2, 



k > 



. r n(rf2 + 4) n-1 
(d^ - d'^ -2)n-2d + 2 



which gives k > 0.43 n for any d. We create k clusters {Vi, V2, . . . , V^} where 
Vi = {ui,Vi} and for each remaining node u E V \ (u^Lj^V^) we create a 
cluster {u} of one node. Using Equation ([3]), we have 



M(5) = J2 



m \ 2m 



> 



dn 71^ J ^ n'^ d n 

1=1 ^ ' j=fc+i 



For fixed d and n > 40(i^, it was shown in [4] that every regular graph 
with n nodes has a bisection width of at most ^| — 0.13-\/rfj (|). Consider 
the partition 5 of G into two clusters C\ and corresponding to such 
a bisection with exactly ^2 nodes in each cluster. Then, m = = 

D2 = m, mi,m2 > ^| + 0.13 x ^/(tj (^) and using Equation we get 
M(Ci) = M(C2) > Consequently, by LemmaOM(5) > □ 

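We now define the following quantities:

$$W' = [w'_{u,v}] \quad \text{where} \quad w'_{u,v} = \begin{cases} 0, & \text{if } u = v \\ w_{u,v}, & \text{otherwise} \end{cases} \qquad \text{and} \qquad D = \mathrm{trace}(W - W') = \sum_{u \in V} w_{u,u}.$$

Thus, if $\mathrm{OPT}_2 = \max_{x \in \{-1,1\}^n} x^T W x$ and $\mathrm{OPT}'_2 = \max_{x \in \{-1,1\}^n} x^T W' x$ then
$\mathrm{OPT}'_2 = \mathrm{OPT}_2 - D$.

Lemma 4.8. $\sum_{u,v \in V} |w'_{u,v}| < 2$.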
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Proof. 



U,veV Wu,v>0 Wu,v<0 \Wu.v>0 



m 



since '^Wu,v = ^ Wu,v Wu,v 

U,V£V Wu,v>0 ■Wu,v<0 



□ 



Next, we bound D by observing that, for any d, D = = ^. To 
complete the proof, we use the algorithm in Theorem 14.61 with Z = W. 
Using Lemmas l2.lt 14.71 and 14.81 we get the desired approximation guarantees 
of Theorem 14.51 by choosing T = \/A\nd in the algorithm in Theorem 14.61 
Then we have the following chain of implications for all sufficiently large d 
and n: 

• OPT'a = OPT2 - D>^-D>^-i>^^-^>^^. 

■i ^ — 2 a n a 2d Inn a 



Thus " ' ^ ^ — r^fl 

OPT' ~ 



. Thus, E[k] >^- 4e-^ rfOPn = ?S - f opt; > 
Thus, the final modularity value achieved is at least 

opn ^ ^ 0PT2 - D _ ^ 



4.11nrf 4.11nrf 
OPT2 1 \ fOA 1 



4.11nd V 4:.l\nd J \ d nJ\d + OAn 



4.11nrf V A.llndJ \l + 0.8lnnj J 4.21nrf 8.41nrf 

(b) The Case When the Input Graph is Weighted

Since the given graph can be assumed to be connected, $\Delta \ge 1 - \frac{1}{n}$. We
want to design an $O(\log d_{\max})$-approximation algorithm assuming
$d_{\max} < \frac{\sqrt[5]{n}}{16 \ln n}$. Again, we first provide a lower bound for OPT.

(* $S$ denotes the set of clusters *)
(* initialization *)

$S = \emptyset$; $V'' = V$; $E'' = E' = \{\, \{u,v\} \mid \{u,v\} \in E \;\&\; \ell(u,v) \ge 1/2 \,\}$; $\forall u \in V$: $C_u = \emptyset$
(* Algorithm *)

while the graph $(V'', E'')$ contains at least one edge do

    pick a node $v \in V''$ that maximizes $L(v) = \sum_{\{u,v\} \in E''} \ell(u,v)$
    $C_v = \{v\} \cup \{\, u \mid \{u,v\} \in E'' \,\}$; add the new cluster $C_v$ to $S$
    $V'' = V'' \setminus C_v$; $E'' = (V'' \times V'') \cap E'$

endwhile

for every $v \in V''$ do

    add the cluster $\{v\}$ to $S$
endfor

Figure 2: Greedy algorithm for computing lower bounds for weighted graphs.
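The greedy procedure of Figure 2 can be sketched in a few lines. This is an illustrative rendering, not the authors' implementation: the function name and the edge representation (a dict keyed by `frozenset({u, v})`) are assumptions, the threshold direction (keep edges of weight at least 1/2) follows the remark in the proof that every edge of $E'$ has weight at least 1/2, and ties in $L(v)$ are broken by the smallest node label, a choice the figure leaves open.

```python
def greedy_clusters(nodes, weight):
    """Greedy clustering in the spirit of Figure 2.

    nodes  : iterable of hashable, sortable node labels
    weight : dict mapping frozenset({u, v}) -> non-negative edge weight
    """
    heavy = {e for e, w in weight.items() if w >= 0.5}   # E'
    remaining = set(nodes)                                # V''
    clusters = []                                         # S
    while True:
        # E'': heavy edges with both endpoints still unclustered.
        live = [e for e in heavy if e <= remaining]
        if not live:
            break

        def load(v):
            # L(v): total weight of live edges incident to v.
            return sum(weight[e] for e in live if v in e)

        # Assumption: ties are broken in favour of the smallest label.
        v = max(sorted(remaining), key=load)
        cluster = {v} | {u for e in live if v in e for u in e}
        clusters.append(cluster)
        remaining -= cluster
    # Every leftover node becomes a singleton cluster.
    clusters.extend({v} for v in sorted(remaining))
    return clusters
```

On a path `0-1-2-3-4-5` with weights `1.0, 0.6, 0.3, 0.9, 0.2`, the light edges `{2,3}` and `{4,5}` are ignored, node 1 is picked first, and node 5 ends up as a singleton.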



Lemma 4.9 (Lower bound on OPT for weighted graphs). If $d_{\max} < \frac{\sqrt[5]{n}}{16 \ln n}$
then $\mathrm{OPT} > \frac{1}{8\, d_{\max}}$.

Proof. We execute the greedy algorithm on G' as shown in Fig. O Note that 
the graph G' = {V,E') has a maximum weighted degree of precisely d 
The number of nodes adjacent to any node v in G' is at most 2dy < 2d, 



max) 
max) 



and i{E') = E|„,t,}e£' ^("' ^ 



m 



{u,v}eE\E' ' 



[u, v) > ™/2. 



Let L{Cy) = ^u,veCv Since the weight of any edge in E' is at 

least 1/2, it is easy to see that during each selection of cluster Cy, L{Cy) is 
at least y^max times the total weight of edges whose one end-point was in Cy. 

Thus, }2cM'^^) ^ 



dn 

all sufficiently large n, 



> 



2 {d. 



2 (rfmax + 1) 



Note that for 



n A 



n A 



> 



n A 



> 



n A 

\2 



> 



n^A^ 



> - 



2'nA 
1 



256ni-6 1n^nA2 



if {u,v} G -E 



otherwise. 
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Thus, for all sufficiently large n, we have 



M(5) = 5^M(C.; 



CvGS 
Wu,u > 



. U,VGCv . 

\{u,v}€E ) 



^ ('^max) 



2nA 



256ni-6 ln^nA2 



> 



2nA 



n {d^ 



nA 



512 In^ n A2 256 n^-^ In^ n A2 



> 



2{du 



-1) 



2nA 



512 nVs In^ 



n 



4K 



> 



512nV5ln^n Sd 



Since < ^ and A > 1 - i, D < #^ 



16 In n 



2{nAY 



J_ (d, 

In 



A — 



□ 



Selecting T = ■\/16 In dmax in Theorem I4.6[ we have the following chain of 
implications: 



OPT 

opt; = OPT2 - D > D 



Thus, < Mdr, 



1 



1 



16 d„ 



512^^5 In^n 



> 



1 



17 d^ 



OPT2 

• Thus, E[k] > - 34e-^ rf^ax 0Pr2 > j^^^ 

lYlnctm; 

and thus the final modularity value achieved is at least 

OPT2 ^ OPT 



171nd, 



0(lndn 



□ 



5. Other Results 

5.1. Modularity Clustering for Directed Weighted Graphs 



Leicht and Newman [25| generalized the modularity measure to weighted 
directed graphs in the following manner. Let G = (V, E, i) be the input 
directed graph with l: E ^ being the function mapping edges to non- 
negative weights. For a node v E V , let ci™ and denote the weighted 
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in-degree and the weighted out-degree of v, respectively. Let m = ^^gy d™ + 
'^v&v ^v^^ ^ ~ ['^u,v] denote the weighted adjacency matrix of G, 

i.e., au,v = i{u,v) if {u,v) G E and au,v = otherwise. Note that the 
matrix A is not necessarily symmetric now. Then, Equation computing 
the modularity value of a cluster C C \/ needs to be modified as 



\u,v£C ^ 



Jout Mn^ 

m 



With some effort, we show that we can extend all our complexity results for 
undirected networks to directed networks. Let A = — — — = — — — 



n n 

in _ Jin 

max 



denote the average weighted degree of nodes of G, and let rf^g^^ = maxd! 
and d^l-^ = max d°^^ denote the maximum weighted in-degree and maximum 
weighted out-degree, respectively, of nodes in G. For convenience, we nor- 
mahze all the weights such that J2vev ^v" + 12vev exactly twice the 

number of directed edges of G. Since the given graph can be assumed to be 
weakly-connected, A > 1 — ^. 

Theorem 5.1.

(a) Computing $\mathrm{OPT}_2$ is NP-complete even if every node $v$ has $d_v^{in} = d_v^{out} = d$,
for any fixed $d \ge 9$.

(b) It is NP-hard to approximate the $k$-clustering problem, for any $k$, within
a factor of $1 + \varepsilon$ for some constant $\varepsilon > 0$ even if every node of the given
directed graph has $d_v^{in} = d_v^{out} = n - 4$.

(c) There is an 0{\ogd) approximation algorithm for unweighted directed 
graphs if the in-degree and out- degree of all nodes is exactly the same, say d, 

and d < -rrr^, — . 

— 100 Inn / / ■ 

(d) There is an O (log ((imax + "^maxj ) -approximation algorithm for weighted 
graphs provided max { <g^, rf^"^*^ } < 

Proof. Remember that 

HC) = -{y. (7) 

\u,vec ^ ^ J 



Footnote: We made no serious attempts to optimize various constants in this theorem.
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The corresponding modification in Equation is 



C,G-S 



where Df = Xliiec, -^i'"* ~ SueCi ^^"^ total weight of edges 

whose both endpoints are in the cluster Ci. Finally, since Ylvev (^^u,v — ^^^^ 
~ Ylvev (^'^u,v — j = for any u E V, we can alternatively express 

1 / /^outJin \\ 

M(C) as M(C) = — J2 ( - ) • Thus, Equation (ED now 

becomes 



where the total weight of the edges directed from Ci to Cj. 

(a) &; (b) These two results follow by the following easy observation. Con- 
sider a given undirected unweighted graph G with n nodes and m edges, 
and let G be the directed graph obtained by replacing each edge {m, v} 
of G by two directed edges {u,v) and each of weight 1; thus m = 

Tlivev + Yliv&v ~ ^ ~ \0'u,v] be the adjacency matrix of G, 

and and be the in-degree and out-degree of the node v in G. Then, it 
is easy to see that every clustering of G of modularity value x translates to a 
corresponding clustering of G of the same modularity value and vice versa. 

(c) & (d) It is easy to see that the proof of Lemma 12.11 works for directed 
networks as well by using Equation (jHD instead of Equation (ED in the proof. 
Thus again it suffices to approximate OPT2. 

Let W = [wu,v] ^ IR"^" be the matrix whose entries are defined by Wu,v = 



m 



. Then, letting G { — 1,1} be the indicator variable denoting 
2m 

in which partition the node u E V belongs. Equation ([TD can be rewritten 
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for a 2-clustermg of directed networks as 

M{S) = ^Wu,v (1 + XuX^) = ^Wu,vXuXv 

U,V&V U,VS:V 

= x^I^x = x^ ( — ^ j X = x^W^'x 

where W = = [w'^ ^] is a symmetric matrix. Note that w'^ ^, 



5. 



u,v 



2m where 6u,v is given by: 



2m 

1, if both (m, v) & E and (t>, u) E E 

5u,v = Sv,u = 0) if both (m, v) ^ E and (f , u) ^ E 

1/2, otherwise. 



Let W = [w^v] be the real symmetric matrix defined by w^y 



0, if M = f 
wi, „, otherwise. 



As in the proof of Theorem 14. 5 [ it follows that "^uv&v^^^ ^ 2- nota- 
tional convenience, define D = trace (w — W'^ = J2uev'^u,u OPTj = 

max x^VTx. 

xe{o,i}" 

(c) G is an unweighted directed graph with rf™ = = d for every 
node V, and d < -rr—. 

' — 5 m n 

The proof of Theorem 14. 51 on the quadratic form maXxg{o,i}" x""" x gives 
an approximation factor of 7 In d, for some constant 7 > 0, for our directed 
network provided we can show that 

OPT' ^ ^ /OPT' , 

2 __i and 



7 Ind \7 lufi^ 

• OPT2 = VL {d~'^) for some constant c > 0. 

Let H be the undirected graph obtained from the given graph G by ignoring 
the direction of the edges and removing parallel edges (if any); every node in 
H has a degree between d and 2d. Greedily pick a maximal matching in if, 
each time selecting an edge and deleting all (at most Ad — l) edges that have 
a common end-point with the picked edge. Such a matching contains at least 
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— — edges, each oi weight at least — 5- = — — j m G. 



4rf 8 ' ^ Am Am? 8dn 2n^ 

Consider the clustering of G where each edge in the matching is a separate 

cluster of two nodes, and each of the remaining nodes is a separate cluster 

of one node. The modularity value of this solution is at least 

n 1 1 1 



- - trace (W -W] > 

\8dn 2v?) 8 V J - 



6Ad 16n 2n 



Thus, OPTn > — = Vtid Moreover, since d < -r^, — we have 

' ^ — 128 a i2n ^ ' ' — 100 mn 

opn ^ ^ OPT^ /opn 



In d In (i 2n 



V Inc/ ) 



(d) max {dj^^^, (i^^^} < 

Let G" = (V, E") be the undirected weighted graph obtained from G 

whose adjacency matrix is VT" = [w^ with w^^^ = I ^"'"^ ^' _ 

Since w'l^>w'^ „, it suffices to show an approximation for maXxg{o,i}" ^W"^. 
The algorithm in the proof of Theorem 14.7( b) with W = W" can now be 
appropriately modified to obtain the desired approximation if one identified 
the quantity rfmax in that proof with d^^^^ + (i^ax- ^ 

5.2. Alternative Modularity Measure: the max-min Objective 

Exact or approximate solutions to the modularity measure may produce
many trivial clusters of single nodes. For example, the following proposition
shows that for a large class of graphs there exists a clustering in which every
cluster except one consists of a single node and whose modularity value is
at least 25% of the optimal.

Proposition 5.2. There exists a clustering for a graph $G$ in which every
cluster except one consists of a single node and whose modularity value is at
least 25% of the optimal if

• $G$ is $d$-regular with $d < \frac{n}{2 \ln n}$, or

• $G$ is an undirected weighted graph with $d_{\max} < \frac{\sqrt[5]{n}}{16 \ln n}$.
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Proof. Let |V^',l^ \ V'^ be an optimal 2-clustermg of G. By Lemma 12. 

OPT2 > By Lemma O M(r') = OPT2/2 = 0PT/4. Suppose that we 

replace the cluster V \V' by |^ \ trivial clusters each of a single node, 

and let C be this new clustering If G is regular, then M(C) = M{V') — D = 
OPT 1 T„^^„rr^ noT- ^ 0.86 4 „„j ^.k,,^ ^Afn\ _ opt 



4 By Lemma 1121 OPT > ^ - f , and thus M(C) = ^ - o(l). 

Similarly, for the case when G is undirected weighted with (imax < Je^: 
the proof of Theorem 14.51 shows that D < ^sj^ ^ , and thus M(C) = 

M(l^') - D > ^ STTT^- By Lemma Ml OPT > and thus 

again M(C) = ^-0(1). □ 

We investigate one alternative to overcome such a shortcoming: define the
modularity of the network as the minimum of the modularities of individual
clusters. Equation (2) now becomes

$$\mathsf{M}^{\text{max-min}}(S) = \min_{C_i \in S} \mathsf{M}(C_i)$$

We will add the superscript "max-min" to differentiate the relevant quantities
for this objective from the usual summation objective discussed before, e.g.,
we will use $\mathrm{OPT}^{\text{max-min}}$ instead of OPT. In a nutshell, our results in the
following lemma show that the max-min objective indeed avoids generating
trivial clusters (Lemma 5.3(a)), and the optimal objective value for the max-min
objective is precisely scaled by a factor of 2 from that of the sum objective,
thereby keeping the overall quantitative measure the same (Lemma 5.3(b)).

Lemma 5.3. Let $G$ be a weighted undirected graph with $m$ edges and maximum
degree $d_{\max}$. Then, the following claims hold:



(a) No optimal solution for max-min objective has a cluster with fewer than 
,^Op-rmax-min ^^^^^^ 

(b) $\mathrm{OPT}^{\text{max-min}} = \frac{\mathrm{OPT}_2}{2}$.

Proof. 

(a) Since only an edge with positive weight can increase the modularity of a 
cluster, it is easy to check that a cluster with y nodes can have a modularity 
value of at most 

4m 

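(b) Consider an optimal clustering $S = \{V_1, V_2, \dots, V_k\}$ with a minimum
number $k$ of clusters such that $\mathrm{OPT}^{\text{max-min}} = \mathsf{M}^{\text{max-min}}(S) = \min_{1 \le i \le k} \{\mathsf{M}(V_i)\} > 0$.
First, consider the case when $k > 3$. We will show that for some non-empty
subset $T$ of $\{V_1, V_2, \dots, V_k\}$ we must have $\mathsf{M}(\cup_{V_j \in T} V_j) \ge \mathsf{M}^{\text{max-min}}(S)$; this
contradicts the minimality of $k$ in our choice of the optimal clustering. Note
that $\mathsf{M}(S) = \sum_{i=1}^{k} \mathsf{M}(V_i) \ge k \cdot \mathsf{M}^{\text{max-min}}(S)$. We will make use of the defining
equation of the modularity of a cluster. Let

$$\overline{\mathsf{M}}(S) = \frac{1}{2m} \sum_{i \neq j} \; \sum_{u \in V_i,\, v \in V_j} \left( a_{u,v} - \frac{d_u d_v}{2m} \right).$$

Then, $\overline{\mathsf{M}}(S) = -\mathsf{M}(S)$. Consider a subset $T$ obtained by randomly and
uniformly selecting each $V_i$ with a probability of $1/2$. Note that each pair of
nodes $u$ and $v$ belonging to the same cluster is selected with a probability of
$1/2$, whereas each pair of nodes belonging to different clusters is selected with
a probability of $1/4$. Thus,

$$\mathrm{E}\left[ \mathsf{M}\left( \cup_{V_j \in T} V_j \right) \right] = \frac{\mathsf{M}(S)}{2} + \frac{\overline{\mathsf{M}}(S)}{4} = \frac{\mathsf{M}(S)}{4}.$$

Since $k \ge 4$, we have $\frac{\mathsf{M}(S)}{4} \ge \frac{k}{4}\,\mathsf{M}^{\text{max-min}}(S) \ge \mathsf{M}^{\text{max-min}}(S)$,
and therefore there exists such a subset $T$ with the properties as claimed.

Otherwise, consider the case when $k = 3$. Let
$\mathsf{M}_{i,j} = \frac{1}{m} \sum_{u \in V_i,\, v \in V_j} \left( a_{u,v} - \frac{d_u d_v}{2m} \right)$
for $i < j$. Without loss of generality, let $\mathsf{M}(V_1) = a$, $\mathsf{M}(V_2) = a + b$ and
$\mathsf{M}(V_3) = a + c$ for some $a > 0$ and $b \ge c \ge 0$; thus, $\mathsf{M}^{\text{max-min}}(S) = a$.
Consider the three 2-clusterings of $G$: $C_1 = (V_1 \cup V_2, V_3)$, $C_2 = (V_2 \cup V_3, V_1)$
and $C_3 = (V_1 \cup V_3, V_2)$. Since none of these three 2-clusterings should be an
optimal solution, we must have

$$\mathsf{M}^{\text{max-min}}(C_1) = \min\{2a + b + \mathsf{M}_{1,2},\; a + c\} < a \;\Longrightarrow\; \mathsf{M}_{1,2} < -(a + b)$$

$$\mathsf{M}^{\text{max-min}}(C_2) = \min\{2a + b + c + \mathsf{M}_{2,3},\; a\} < a \;\Longrightarrow\; \mathsf{M}_{2,3} < -(a + b + c)$$

$$\mathsf{M}^{\text{max-min}}(C_3) = \min\{2a + c + \mathsf{M}_{1,3},\; a + b\} < a \;\Longrightarrow\; \mathsf{M}_{1,3} < -(a + c)$$

Thus, we have $\mathsf{M}(V_1) + \mathsf{M}(V_2) + \mathsf{M}(V_3) = 3a + b + c = -\mathsf{M}_{1,2} - \mathsf{M}_{2,3} - \mathsf{M}_{1,3} >
3a + 2b + 2c$, which implies $b + c < 0$, contradicting $b \ge c \ge 0$.

Thus, we have shown there is an optimal solution for our max-min objective
with no more than two clusters. Obviously, if $\mathrm{OPT}^{\text{max-min}} > 0$ then an
optimal solution cannot consist of a single cluster. Let $V_1, V_2$ be the two
clusters in this case. By Lemma 2.2 we have $\mathsf{M}(V_1) = \mathsf{M}(V_2)$, which implies
$\mathrm{OPT}^{\text{max-min}} = \frac{\mathrm{OPT}_2}{2}$. $\Box$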
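The averaging step used for $k > 3$ (a uniformly random sub-collection of clusters has expected union modularity $\mathsf{M}(S)/4$) can be verified exactly by enumerating all $2^k$ sub-collections. The sketch below is an illustration with hypothetical function names; it evaluates the modularity of a node set $C$ as $m_C/m - (D_C/2m)^2$ on an unweighted graph and counts the empty selection with modularity 0.

```python
from fractions import Fraction
from itertools import product


def cluster_modularity(edges, cluster):
    """m_C/m - (D_C/(2m))^2 for a node set C of an unweighted simple graph."""
    m = len(edges)
    c = set(cluster)
    inside = sum(1 for u, v in edges if u in c and v in c)
    degree_sum = sum((u in c) + (v in c) for u, v in edges)
    return Fraction(inside, m) - Fraction(degree_sum, 2 * m) ** 2


def average_union_modularity(edges, clusters):
    """Average of M(union of T) over all 2^k sub-collections T of the clusters."""
    k = len(clusters)
    total = Fraction(0)
    for picks in product((0, 1), repeat=k):
        union = set().union(*(c for c, p in zip(clusters, picks) if p))
        total += cluster_modularity(edges, union)
    return total / 2 ** k
```

For an 8-cycle split into four clusters of two adjacent nodes each, the average over all 16 selections equals one quarter of the sum of the four cluster modularities.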

5.3. Alternative Null Model: Erdős-Rényi Random Graphs

A theoretically appealing choice for alternative null models is the classical
Erdős-Rényi random graph model $G(n, p)$, namely each possible edge $\{u, v\}$
is selected in $G$ uniformly and randomly with a probability of $p$ for some
fixed $0 < p < 1$. To summarize, our results in this section show that the
new modularity measure is precisely Newman's modularity measure on an
appropriately defined regular graph, and thus our previous results on regular
graphs can be applied to this case.

We will add the superscript "ER" to differentiate the relevant quantities
for this objective from the usual summation objective discussed before, e.g.,
we will use $\mathrm{OPT}^{ER}$ instead of OPT. For simplicity, we consider the case of
unweighted graphs only. Let $G = (V, E)$ be the given unweighted input graph
with $m = n\Delta$ edges. Select $p = \frac{2\Delta}{n-1}$ such that the null model has
the same number of edges in expectation as the given graph $G$. Equation (1)
then becomes

$$\mathsf{M}^{ER}(C) = \frac{1}{2m} \sum_{u,v \in C} \left( a_{u,v} - p \right)$$

Let $n$ be sufficiently large such that $p \approx (2\Delta)/n$. It can then be seen that
$\mathsf{M}^{ER}(C)$ is precisely the same as $\mathsf{M}(C)$ on a $(2\Delta)$-regular graph. Thus, our
previous results on regular graphs can be generalized to this case in the
following manner:

• Computing $\mathrm{OPT}^{ER}$ is NP-complete for graphs with $\Delta > 18$.

• If $\Delta < \frac{n}{4 \ln n}$ then the problem admits an $O(\log \Delta)$-approximation.
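The claim that the Erdős-Rényi null term is essentially the null term of a $(2\Delta)$-regular graph comes down to a small piece of arithmetic, sketched here with exact fractions. The function names are hypothetical, and the choice $p = 2\Delta/(n-1)$ is the one that makes the expected number of edges of $G(n, p)$ equal to $m = n\Delta$.

```python
from fractions import Fraction


def er_null_term(n, delta):
    """p = 2*delta/(n-1): edge probability giving n*delta expected edges."""
    return Fraction(2 * delta) / (n - 1)


def regular_null_term(n, delta):
    """d_u*d_v/(2m) on a (2*delta)-regular graph with m = n*delta edges."""
    d = Fraction(2 * delta)
    m = Fraction(n) * delta
    return d * d / (2 * m)


def null_term_gap(n, delta):
    """Difference between the two null terms; it equals 2*delta/(n*(n-1))."""
    return er_null_term(n, delta) - regular_null_term(n, delta)
```

The gap shrinks quadratically in $n$, which is what the approximation $p \approx (2\Delta)/n$ relies on for sufficiently large $n$.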
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